
refactor: correct the implementation of all_schemas() #5236

Merged: 2 commits merged into apache:master on Feb 12, 2023


@jackwener (Member) commented Feb 10, 2023

Which issue does this PR close?

Closes #5192.

Rationale for this change

Currently, all_schemas() can return the schema of a child's child, because it calls all_schemas() recursively.
But a plan shouldn't get the schemas of its grandchildren (or of any deeper descendants).
It should only get its own schema and its children's schemas.

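For example, consider the plan

A join (Project (B.id = 3) -- B)

Here the join's inputs are A and the Project, and B is the Project's input. The join shouldn't get the schema of B.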
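To make the problem concrete, here is a minimal sketch of the recursive behavior on that example plan. It uses a hypothetical toy Plan type whose schema is just a list of column names, not DataFusion's real LogicalPlan or DFSchema. The column names are assumed: B outputs [b.id, b.name] and the Project keeps only b.id.

```rust
// Hypothetical toy model, not DataFusion's LogicalPlan: a node's schema is
// just a list of column names, and `inputs` are its direct children.
struct Plan {
    schema: Vec<&'static str>,
    inputs: Vec<Plan>,
}

impl Plan {
    fn new(schema: &[&'static str], inputs: Vec<Plan>) -> Self {
        Plan { schema: schema.to_vec(), inputs }
    }

    // Recursive collection: the node's own schema, then everything each
    // input returns from the same call, so every descendant is included.
    fn all_schemas_recursive(&self) -> Vec<&[&'static str]> {
        let mut schemas = vec![self.schema.as_slice()];
        for input in &self.inputs {
            schemas.extend(input.all_schemas_recursive());
        }
        schemas
    }
}

// A join (Project (B.id = 3) -- B), with assumed column names.
fn example_plan() -> Plan {
    let a = Plan::new(&["a.id"], vec![]);
    let b = Plan::new(&["b.id", "b.name"], vec![]);
    let project = Plan::new(&["b.id"], vec![b]);
    Plan::new(&["a.id", "b.id"], vec![a, project])
}

fn main() {
    let join = example_plan();
    // Prints four schemas; the last one is B's ["b.id", "b.name"],
    // even though B is not a direct input of the join.
    for schema in join.all_schemas_recursive() {
        println!("{:?}", schema);
    }
}
```

The join ends up "seeing" b.name, a column the Project has already dropped.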

So I refactored this implementation to remove the recursion.

What changes are included in this PR?

all_schemas() now returns only the plan's own schema and its children's schemas.
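A minimal sketch of that non-recursive behavior, using a hypothetical toy Plan type rather than DataFusion's real LogicalPlan: return the node's own schema followed by the output schema of each direct input, and never descend further. The example plan mirrors A join (Project (B.id = 3) -- B), with assumed column names.

```rust
// Hypothetical toy model, not DataFusion's LogicalPlan: a node's schema is
// just a list of column names, and `inputs` are its direct children.
struct Plan {
    schema: Vec<&'static str>,
    inputs: Vec<Plan>,
}

impl Plan {
    fn new(schema: &[&'static str], inputs: Vec<Plan>) -> Self {
        Plan { schema: schema.to_vec(), inputs }
    }

    // Non-recursive: own schema first, then each direct input's output
    // schema. Grandchildren are never visited.
    fn all_schemas(&self) -> Vec<&[&'static str]> {
        std::iter::once(self.schema.as_slice())
            .chain(self.inputs.iter().map(|input| input.schema.as_slice()))
            .collect()
    }
}

// A join (Project (B.id = 3) -- B), with assumed column names.
fn example_plan() -> Plan {
    let a = Plan::new(&["a.id"], vec![]);
    let b = Plan::new(&["b.id", "b.name"], vec![]);
    let project = Plan::new(&["b.id"], vec![b]);
    Plan::new(&["a.id", "b.id"], vec![a, project])
}

fn main() {
    let join = example_plan();
    // Prints ["a.id", "b.id"], ["a.id"] and ["b.id"]; B's schema is absent.
    for schema in join.all_schemas() {
        println!("{:?}", schema);
    }
}
```

DataFusion's LogicalPlan does expose schema() and inputs(), so a similar version can likely be built on them; treat that as an assumption, not a description of the merged diff.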

Are these changes tested?

Are there any user-facing changes?

@github-actions bot added the logical-expr (Logical plan and expressions) label Feb 10, 2023
@alamb (Contributor) left a comment

Thank you @jackwener

This function has always seemed strange to me. I wonder if we can work on getting rid of it entirely.

I think every Expr is evaluated in the context of a LogicalPlan, and the schema within which to evaluate the Expr is clear:

 (LogicalPlan)  <-- Exprs *within* a logical plan should be interpreted 
                         with respect to the *child output schemas* 
                         (aka the input schemas)
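The rule in this diagram can be sketched as resolving each column reference against the output schemas of the plan's direct inputs only. The Plan type and resolve_column method below are hypothetical illustrations, not DataFusion's column-resolution API (which works with qualified columns and DFSchema); the column names for the example plan A join (Project (B.id = 3) -- B) are assumed.

```rust
// Hypothetical toy model: a node's output schema is a list of column names,
// and `inputs` are its direct children.
struct Plan {
    schema: Vec<&'static str>,
    inputs: Vec<Plan>,
}

impl Plan {
    fn new(schema: &[&'static str], inputs: Vec<Plan>) -> Self {
        Plan { schema: schema.to_vec(), inputs }
    }

    // Returns the index of the first direct input whose output schema
    // contains `column`, or None if no direct input provides it.
    fn resolve_column(&self, column: &str) -> Option<usize> {
        self.inputs
            .iter()
            .position(|input| input.schema.contains(&column))
    }
}

// A join (Project (B.id = 3) -- B), with assumed column names.
fn example_plan() -> Plan {
    let a = Plan::new(&["a.id"], vec![]);
    let b = Plan::new(&["b.id", "b.name"], vec![]);
    let project = Plan::new(&["b.id"], vec![b]);
    Plan::new(&["a.id", "b.id"], vec![a, project])
}

fn main() {
    let join = example_plan();
    println!("{:?}", join.resolve_column("a.id")); // Some(0): input A
    println!("{:?}", join.resolve_column("b.id")); // Some(1): the Project
    // b.name exists only in B, a grandchild, so the join cannot resolve it.
    println!("{:?}", join.resolve_column("b.name")); // None
}
```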

I realize you didn't introduce this function, and this change seems to make things better (and all the tests pass), but I still think there is more work to do here somewhere.

Thanks again

Review thread on datafusion/expr/src/logical_plan/plan.rs (outdated, resolved)
@alamb merged commit 0f95966 into apache:master Feb 12, 2023
@alamb (Contributor) commented Feb 12, 2023

Thanks again @jackwener

@ursabot commented Feb 12, 2023

Benchmark runs are scheduled for baseline = 4eb1a57 and contender = 0f95966. 0f95966 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Labels: logical-expr (Logical plan and expressions)
Linked issue: all_schema() will get schema of child of child of ....
3 participants